43 research outputs found

    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly the CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
    A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.

    Data Integration for Heterogenous Datasets


    Modeling Topic Diffusion in Multi-Relational Bibliographic Information Networks

    Information diffusion has been widely studied in networks, aiming to model the spread of information among objects when they are connected with each other. Most of the current research assumes the underlying network is homogeneous, i.e., objects are of the same type and they are connected by links with the same semantic meanings. However, in the real world, objects are connected via different types of relationships, forming multi-relational heterogeneous information networks. In this paper, we propose to model information diffusion in such multi-relational networks, by distinguishing the power in passing information around for different types of relationships. We propose two variations of the linear threshold model for multi-relational networks, by considering the aggregation of information at either the model level or the relation level. In addition, we use real diffusion action logs to learn the parameters in these models, which will benefit diffusion prediction in real networks. We apply our diffusion models in two real bibliographic information networks, DBLP network and APS network, and experimentally demonstrate the effectiveness of our models compared with single-relational diffusion models. Moreover, our models can determine the diffusion power of each relation type, which helps us understand the diffusion process better in the multi-relational bibliographic network scenario.

    Building a Cross-Language Entity Linking Collection in Twenty-One Languages


    ROSeAnn


    CommandSpace
